AN APPARATUS AND METHOD FOR DISTRIBUTING AND
COLLECTING BULK DATA BETWEEN A LARGE NUMBER OF MACHINES 

CROSS-REFERENCE TO RELATED APPLICATIONS 

Related subject matter may be found in the following commonly assigned, 
co-pending U.S. Patent Applications, which are hereby incorporated by reference 
herein: 

Serial No. (AT9-99-275), entitled "APPARATUS FOR DATA DEPOTING AND
METHOD THEREFOR";

Serial No. (AT9-99-276), entitled "APPARATUS FOR RELIABLY RESTARTING
INTERRUPTED DATA TRANSFER AT LAST SUCCESSFUL TRANSFER POINT AND METHOD
THEREFOR";

Serial No. (AT9-99-655), entitled "APPARATUS FOR CONNECTION MANAGEMENT
AND METHOD THEREFOR" and filed concurrently herewith;

Serial No. (AT9-99-324), entitled "COMPUTER NETWORK CONTROL SYSTEMS
AND METHODS" and filed concurrently herewith;

Serial No. (AT9-99-325), entitled "METHODS OF DISTRIBUTING DATA IN A
COMPUTER NETWORK AND SYSTEMS USING THE SAME";

Serial No. (AT9-99-315), entitled "SYSTEMS AND METHODS FOR REAL TIME
PROGRESS MONITORING IN A COMPUTER NETWORK";

Serial No. (AT9-99-316), entitled "APPARATUS FOR AUTOMATICALLY
GENERATING RESTORE PROCESS DURING SOFTWARE DEPLOYMENT AND METHOD
THEREFOR"; and

Serial No. (AT9-99-323), entitled "AN APPARATUS FOR JOURNALING DURING
SOFTWARE DEPLOYMENT AND METHOD THEREFOR".

TECHNICAL FIELD 

The present invention relates generally to data processing systems, and in 
particular, to bulk data distributions within networked data processing systems. 

BACKGROUND INFORMATION 

Present day data processing systems are often configured in large multi-user
networks. Management of such networks may typically include the need to transfer
large amounts of data to an endpoint system from a source system (or, simply, "a
source") and the collection of information, for example, error reports, from a
multiplicity of endpoint systems (or, simply, "endpoints").

Such large data transfers may occur within a network, for example, to
distribute software updates. The system administrator may need to allocate a specific
period of time for the bulk data transfer to more efficiently utilize network resources.
This may typically occur when the communication load on the system is lowest,
usually at night when most endpoint users are not working at their stations. The
system administrator may load the bulk data and the corresponding transfer
instructions onto the network system's source, or server, in preparation for the
transfer. At the predetermined time set by the administrator, the server will push the
data while ensuring that the bulk data is successfully transferred to each of the desired
endpoint locations. However, during the transfer a portion of the system server is
dedicated to the data transfer and thus unavailable for other networking tasks.
Moreover, as the number of endpoints which must be simultaneously serviced by the
bulk data distribution increases, network bandwidth demands are concomitantly
increased. This complicates scalability of the bulk distribution systems.

Therefore, a need exists in the art for a bulk distribution mechanism that can
transfer large amounts of data between network connected subsystems (or nodes)
while maintaining scalability. Additionally, there is a need in such distribution
mechanisms for methods and apparatus to distribute bulk data to a multiplicity of
endpoints and to collect bulk data, including large log files, from the endpoints.

SUMMARY OF THE INVENTION

The aforementioned needs are addressed by the present invention.
Accordingly, the present invention provides a general service that allows applications
to asynchronously distribute large amounts of data from a source node to multiple
destination nodes, to process the data on each destination node, and then to collect the
results on one or more "report-to" nodes.

The present invention includes fan-out nodes (which will also be referred to as
repeaters) and methods therefor, which are nodes on the network which receive bulk
data streams, and retransmit the data to the follow-on fan-out nodes or to endpoints.
Additionally, the fan-out nodes receive bulk results from downstream and retransmit
them to upstream fan-out nodes or final report-to nodes.

Additionally, the present invention includes a method and apparatus for
enqueuing the distribution information received from a requesting application in a
persistent queue at the repeaters according to its priority, and for returning to the
application a unique ID that can be used as a correlator for the results.

The invention also provides for a different maximum number of available 
sessions according to a predetermined set of selectable transmission priority levels. A 
distribution with a given priority level can use the number of sessions reserved for its 
priority level plus any sessions allocated for lower priority levels. 

The foregoing has outlined rather broadly the features and technical
advantages of the present invention in order that the detailed description of the
invention that follows may be better understood. Additional features and advantages
of the invention will be described hereinafter which form the subject of the claims of
the invention. It is important to note the drawings are not intended to represent the
only form of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention, and the 
advantages thereof, reference is now made to the following descriptions taken in 
conjunction with the accompanying drawings, in which: 

FIGURE 1 illustrates, in block diagram form, a data processing network in
accordance with one embodiment of the present invention;

FIGURE 2 illustrates, in block diagram form, a data processing system
implemented in accordance with an embodiment of the present invention;

FIGURE 3A illustrates, in flowchart form, a distribution request methodology
in accordance with an embodiment of the present invention;

FIGURE 3B illustrates, in tabular form, a distribution structure in accordance
with an embodiment of the present invention;

FIGURE 4A illustrates, in flowchart form, a methodology to transfer data in
accordance with an embodiment of the present invention;

FIGURE 4B is a continuation of FIGURE 4A and illustrates, in flowchart
form, a methodology to transfer data over a network in accordance with an
embodiment of the present invention;

FIGURE 5 illustrates, in flowchart form, a methodology implemented to
determine priority resource availability in accordance with an embodiment of the
present invention; and

FIGURE 6 illustrates, in flowchart form, a database management methodology
implemented in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

The present invention is a method and apparatus for distributing and collecting 
data between an originating source system and a plurality of endpoint systems (which 
may also be referred to as "endpoint nodes" or simply "endpoints"). The method and 
apparatus provide a general service that allows applications to asynchronously 
distribute large amounts of data from a source node to multiple destination nodes, to 
process the data on each destination node, and then to collect the results of that 
processing on one or more report-to nodes. 

According to the principles of the present invention, the present invention has 
an originating source system followed by repeaters. The use of repeaters allows data 
to be delivered essentially simultaneously to a large number of machines. The present 
invention can be scaled to handle more destinations by adding repeaters. In the 
following description, numerous specific details are set forth to provide a thorough 
understanding of the present invention. However, it will be obvious to those skilled 
in the art that the present invention may be practiced without such specific details. In 
other instances, well-known circuits have been shown in block diagram form in order 
not to obscure the present invention in unnecessary detail. For the most part, details 
concerning timing considerations and the like have been omitted inasmuch as such 
details are not necessary to obtain a complete understanding of the present invention 
and are within the skills of persons of ordinary skill in the relevant art. 

A more detailed description of the implementation of the present invention
will subsequently be provided. Prior to that discussion, an environment in which the
present invention may be implemented will be described in greater detail.

FIGURE 1 illustrates a communications network 100. The subsequent
discussion and description of FIGURE 1 are provided to illustrate an exemplary
environment used by the present invention.

The network system 100 includes source system 101, one or more fan-out nodes, or
repeaters, 110, 111, 118, 119, and a plurality of endpoints 112-117. Additionally,
certain repeaters, such as 118 and 119, are directly connected to one or more
endpoints (in the exemplary embodiment of FIGURE 1, endpoints 112-114 or
115-117, respectively) and may be referred to as "gateway" repeaters (or, simply,
"gateways").

Source system 101 provides distribution services with respect to
resources 112-117. Note that source system 101 and endpoints 112-117 interface to
repeaters 110 and 111 using the same methodologies as repeaters 110 and 111
interface with, for example, repeaters 118 and 119. Viewed logically, source system
101 and endpoints 112-117 each may include a "repeater". In other words, as an
artisan of ordinary skill would recognize, as used herein, a repeater may be a logical
element that may be, but is not necessarily, associated with a physical stand-alone
hardware device in network 100. Repeater 110 may be the primary repeater through
which resources 112-114 receive their data transfers, and repeater 111, likewise, may
primarily service endpoints 115-117. Additionally, any report-back of successful
transfers will be transmitted primarily via the endpoint's primary domain except as
explained below. It would be understood by an artisan of ordinary skill that
additional repeaters may be inserted into the network and may be arranged in a
multi-level hierarchy according to the demands imposed by the network size.
Gateway repeaters 118 and 119 are such repeaters in the exemplary
embodiment of FIGURE 1.

However, network system 100 may provide cross connections in order to
provide redundant, parallel communication paths should the primary communication
path to the endpoint become unavailable. For example, in FIGURE 1, endpoint 114
has a primary pathway to source system 101 through repeaters 118 and 110. (A
source system, such as source system 101, may also be referred to as a source node.)
Should repeater 110 become unavailable, source system 101 can transfer bulk data to
endpoint 114 via an alternative pathway through repeaters 118 and 111. Additionally,
should repeater 118 become unavailable, endpoint 114 may receive data via repeaters
111 and 119. Source system 101 maintains database 120 for storing information used
in managing a data distribution. A methodology which may be used to process the
information to be stored in database 120 will be described in conjunction with
FIGURE 6.
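
The selection of an alternative pathway described above can be illustrated with a
minimal sketch. The adjacency table below is an assumption chosen only to
reproduce the three pathways named in the text for endpoint 114; the actual cross
connections of FIGURE 1 are not enumerated here, and the names LINKS and
find_path are hypothetical.

```python
from collections import deque

# Hypothetical links for the FIGURE 1 example. The cross connections
# (111 -> 118 and 119 -> 114) are assumptions made for illustration.
LINKS = {
    101: [110, 111],            # source system to first-level repeaters
    110: [118],                 # primary path for endpoints 112-114
    111: [118, 119],            # cross connection to 118, primary to 119
    118: [112, 113, 114],       # gateway repeater
    119: [114, 115, 116, 117],  # gateway repeater with a cross connection to 114
}

def find_path(source, endpoint, unavailable=frozenset()):
    """Breadth-first search for a pathway that avoids unavailable nodes."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == endpoint:
            return path
        for nxt in LINKS.get(node, []):
            if nxt not in seen and nxt not in unavailable:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no pathway remains
```

Under these assumed links, the search yields the primary pathway through
repeaters 110 and 118 when all repeaters are available, and falls back to a
pathway through repeater 111 when repeater 110 or repeater 118 is marked
unavailable.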

Referring next to FIGURE 2, an example is shown of a data processing
system 200 which may be used to implement a source system such as system 101,
repeaters, such as repeaters 110, 111, 118, or 119, or endpoints, such as endpoints
112-117, executing the methodology of the present invention. The system has a
central processing unit (CPU) 210, which is coupled to various other components by
system bus 212. Read only memory ("ROM") 216 is coupled to the system bus 212
and includes a basic input/output system ("BIOS") that controls certain basic
functions of the data processing system 200. Random access memory ("RAM") 214,
I/O adapter 218, and communications adapter 234 are also coupled to the system
bus 212. I/O adapter 218 may be a small computer system interface ("SCSI") adapter
that communicates with a disk storage device 220. Disk storage device 220 may be
used to hold database 120, FIGURE 1. Communications adapter 234 interconnects
bus 212 with the network as well as outside networks, enabling the data processing
system to communicate with other such systems. Input/Output devices are also
connected to system bus 212 via user interface adapter 222 and display adapter 236.
Keyboard 224, trackball 232, mouse 226 and speaker 228 are all interconnected to
bus 212 via user interface adapter 222. Display monitor 238 is connected to system
bus 212 by display adapter 236. In this manner, a user is capable of inputting to the
system through the keyboard 224, trackball 232 or mouse 226 and receiving output
from the system via speaker 228 and display 238.

Preferred implementations of the invention include implementations as a
computer system programmed to execute the method or methods described herein,
and as a computer program product. According to the computer system
implementation, sets of instructions for executing the method or methods are resident
in the random access memory 214 of one or more computer systems configured
generally as described above. Until required by the computer system, the set of
instructions may be stored as a computer program product in another computer
memory, for example, in disk drive 220 (which may include a removable memory
such as an optical disk or floppy disk for eventual use in the disk drive 220). Further,
the computer program product can also be stored at another computer and transmitted
when desired to the user's work station by a network or by an external network such
as the Internet. One skilled in the art would appreciate that the physical storage of the
sets of instructions physically changes the medium upon which it is stored so that the
medium carries computer readable information. The change may be electrical,
magnetic, chemical, biological, or some other physical change. While it is convenient
to describe the invention in terms of instructions, symbols, characters, or the like, the
reader should remember that all of these and similar terms should be associated with
the appropriate physical elements.

Note that the invention may describe terms such as comparing, validating,
selecting, identifying, or other terms that could be associated with a human operator.
However, for at least a number of the operations described herein which form part of
at least one of the embodiments, no action by a human operator is desirable. The
operations described are, in large part, machine operations processing electrical
signals to generate other electrical signals.

Refer now to FIGURE 3A in which is illustrated a flow chart of a
methodology 300 for processing distribution requests in accordance with an
embodiment of the present invention.

Applications resident on endpoint systems may request a distribution of bulk 
data. The application may, for example, maintain a periodic update schedule, or 
respond to a user update request. When an application requires updating, or the 
application otherwise requires a distribution of data, the application requests the data 
from the source system, step 301. The source system may be a server, such as 
server 110, FIGURE 1. 

Applications may submit a request by invoking a predetermined method
which is included in each repeater. In step 302, the request may be transmitted to the
source system via one or more repeaters in similar fashion. The distribution request
may be encapsulated in a data structure, which may be passed to the method. A data
structure 315 which may be used in the present invention is schematically shown in
FIGURE 3B. Data structure 315 includes entries 317, 319, 321 and 323. These
entries respectively include a distribution data specifier which identifies the data to be
distributed in response to the request and the location of the data, a list of destination
node identifiers (IDs) that specify the endpoints that are to receive the data, the
method on the endpoint that will receive and process the data, and the method that
will receive and process results information from each endpoint node receiving the
distribution. The method identifier in field 321 informs the source system of the
method on the endpoint system to be invoked to receive and process the data. As
described further below, endpoints provide result information to the source system.
The method identifier in field 323 informs the source system of the method on the
source system that a repeater will invoke to receive and process the result information
sent by the endpoint. In an embodiment of the present invention, this method may be
implemented in accordance with the Common Object Request Broker Architecture
(CORBA). The CORBA is specified in "The Common Object Request Broker:
Architecture and Specification," Version 2.3, June 1999, which is hereby incorporated
herein by reference.
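
By way of illustration only, the four entries of data structure 315 might be
represented as follows. This is a minimal sketch: the class name, the field names,
the string types and the example values are assumptions, since the source
describes only what each entry holds, not how it is encoded.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DistributionRequest:
    """Hypothetical rendering of data structure 315 (FIGURE 3B)."""
    data_specifier: str         # entry 317: identifies the data and its location
    destination_ids: List[str]  # entry 319: endpoints that are to receive the data
    endpoint_method: str        # entry 321: method on the endpoint that receives and processes the data
    results_method: str         # entry 323: method that receives and processes results information

# Example values are invented for illustration.
request = DistributionRequest(
    data_specifier="/depot/update-package",
    destination_ids=["endpoint-112", "endpoint-113"],
    endpoint_method="install_update",
    results_method="record_results",
)
```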

In step 320, the request is received by the source system, and, in step 340, the 
source system enqueues the distribution information from the distribution structure in 
a database, such as database 120, FIGURE 1. The distribution information is 
enqueued in accordance with a preselected distribution priority which may, in an 
embodiment of the present invention, be one of three levels: high, medium, or low. 
The use of the priority schedule in transferring data is discussed in detail in 
conjunction with FIGURES 4A, 4B and 5. 

The source system provides the target endpoint(s) a distribution identification
(DID) in step 350. The DID is used by the endpoints receiving the distribution to tag
the results information, whereby the source system may correlate the results
information when it is received from the endpoints. Additionally, the DID is used by
the endpoints to check the data transmission as it is occurring as discussed in more
detail in the co-pending commonly-owned U.S. Patent Application entitled
"Apparatus for Restarting Interrupted Data Transfer and Method Therefor,"
incorporated herein by reference. The application can exit to allow the server and
endpoints to perform other operations, step 370. The supplied notification method
will be called by the source system to receive results information sent from each
endpoint node. The distribution of the data is described in FIGURE 4A and
FIGURE 4B.
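
The enqueuing of a distribution by priority and the return of a DID for later
correlation can be sketched as follows. This is an in-memory illustration under
stated assumptions, not the persistent database queue of the embodiment: the class
name, the use of a random hexadecimal string as the DID, and the default priority
of "medium" are all hypothetical. Ties within one priority level are broken by
arrival order.

```python
import heapq
import itertools
import uuid

# Lower rank is served first; the three levels follow the embodiment described above.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}

class DistributionQueue:
    """In-memory stand-in for the priority queue of step 340."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # preserves arrival order within a priority

    def enqueue(self, distribution, priority="medium"):
        """Queue a distribution and return a DID usable as a results correlator."""
        did = uuid.uuid4().hex  # assumption: any unique identifier would serve
        entry = (PRIORITY_RANK[priority], next(self._arrival), did, distribution)
        heapq.heappush(self._heap, entry)
        return did

    def dequeue(self):
        """Remove and return the (DID, distribution) pair to be handled next."""
        _, _, did, distribution = heapq.heappop(self._heap)
        return did, distribution
```

Because the arrival counter is unique for every entry, two entries never compare
on the DID or on the distribution itself.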

Refer now to FIGURE 4A and FIGURE 4B in which is illustrated a flow chart
of methodology 400 for distributing and collecting bulk data between data processing
systems. Methodology 400 may be implemented by a data processing network such
as network 100, FIGURE 1.

In step 401, data for distribution is loaded on to the source system, for
example source system 101, FIGURE 1. The administrator may allocate the
appropriate endpoints to receive the data. Or, as described above, the appropriate
endpoints may request the distribution based on applications running on the
endpoints. Status information regarding the distribution is retained in a database
which, in an embodiment of the present invention, may be maintained by the source
system, such as source system 101, FIGURE 1, as previously described. In step 402,
a database entry corresponding to the distribution loaded in step 401 is created. If the
system administrator has identified target endpoints, these are stored in the database
entry, which will incorporate status information for each distribution endpoint.
Additionally, endpoint destinations identified in distribution requests as described
above in conjunction with FIGURE 3 will also be incorporated in the database entry
for the distribution. In an embodiment of the present invention, employing a network
such as network 100, FIGURE 1, the database entry may be included in database 120
maintained by source system 101.

In step 404, a connection is opened to a target repeater. The target repeater 
may be a gateway repeater, as discussed below in conjunction with step 410, or may 
be an intermediate repeater for fanning out the distribution to one or more gateways. 
In step 405, it is determined if a session is available. Network bandwidth
management is implemented by allocating resources (referred to as sessions) for
transferring data in accordance with a priority scheme in which a particular
distribution is assigned one of a predetermined set of priority levels. A method and
apparatus for connection management is described in detail in the corresponding
commonly owned U.S. Patent Application entitled "An Apparatus for Connection
Management and Method Therefor," incorporated herein by reference. For each
priority level a predetermined "pool" of resources, or sessions, for transferring data is
allocated, and a distribution may use a session from the pool corresponding to the
priority level of the distribution or from a lower priority level pool. That is, a high
priority pool, a medium priority pool and a low priority pool may each have its own
preselected number of sessions allocated. The availability of a session is
determined based on the available bandwidth and the priority level of the distribution
as described in detail in conjunction with FIGURE 5. If in step 405 a connection is
available, while the connection is established, step 411, data is transferred to the
repeater, in step 406. The transferred data is stored on the repeater, step 408. The
storage may be temporary or permanent in accordance with an explicit administrator
command, or control information provided by an application requesting the
distribution. The storage, or depoting, of data on a repeater is described in detail in
the co-pending, commonly owned U.S. Patent Application entitled, "Apparatus for
Data Depoting and Method Therefor," (Attorney Docket No. AT9-99-275), and
incorporated herein by reference.

In step 415, it is determined if all of the data constituting the distribution has
been transferred. If not, methodology 400 returns to step 411, and data transfer
continues by following the "True" branch to step 406, provided the connection has not
failed.

If the connection has failed, methodology 400 returns to step 405.
Methodology 400 then loops between steps 405 and 407, determining in step 407
whether a preselected amount of time has elapsed, until, in step 405, the
connection has become available.

If, in step 407, the preselected time interval has elapsed, which may be
referred to as "timing out," it is then determined, in step 425, if a preselected
distribution lifetime has expired. If, in step 425, the distribution lifetime has not
expired, then methodology 400 proceeds to step 404 to open a connection to an
alternative repeater, wherein methodology 400 then performs step 405 to determine
the availability of a connection to the alternative repeater, in the same fashion as
discussed above. Recall, as discussed hereinabove in conjunction with FIGURE 1, a
network in accordance with an embodiment of the present invention may include a
plurality of parallel repeater paths between the distribution source system and the
target endpoints. However, in an alternative embodiment of the present invention, the
network may not implement parallel paths, and in such an alternative embodiment, in
steps 404 and 405, the connection is retried to the same target.

A method and apparatus for connection management which may be used to 
implement steps 404-415 is described in the commonly owned, co-pending U.S. 
Patent Application entitled "Apparatus for Connection Management and Method 
Therefor" (Attorney Docket No. AT9-99-655) incorporated herein by reference. 

If, however, in step 425 the distribution lifetime has expired, the distribution 
is aborted, step 419. In step 420, the status of the distribution, which is included in 
the results information associated with the distribution, is returned to the source 
system, or other preselected "report-to" systems which may include an endpoint 
system running the application requesting the distribution. 
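
The interplay of the connection timeout (step 407), the distribution lifetime
(step 425) and the selection of an alternative repeater (step 404) may be sketched
as below. The function name, its parameters, the polling interval and the
injectable clock and sleep callables are assumptions made so the sketch is
self-contained; the connection management of the embodiment is described in the
co-pending application referenced above and is not reproduced here.

```python
import time
from itertools import cycle

def acquire_connection(repeaters, is_available, timeout_s, lifetime_s,
                       poll_s=0.01, clock=time.monotonic, sleep=time.sleep):
    """Return a repeater with an available connection, or None to signal an abort.

    `repeaters` lists the target followed by any alternatives; with a single
    entry the same target is retried, as in the embodiment without parallel paths.
    """
    deadline = clock() + lifetime_s       # distribution lifetime checked in step 425
    for repeater in cycle(repeaters):     # step 404: open to the target or an alternative
        wait_until = clock() + timeout_s
        while clock() < wait_until:       # loop between steps 405 and 407
            if is_available(repeater):    # step 405: connection available
                return repeater
            sleep(poll_s)
        if clock() >= deadline:           # step 425: lifetime has expired
            return None                   # step 419: the distribution is aborted
    return None                           # reached only if no repeaters were supplied
```

A caller receiving None would then report the aborted status to the report-to
systems, corresponding to step 420.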

Returning to step 405, if a connection is reestablished before the timeout
period has elapsed, methodology 400 proceeds by the "Yes" branch to step 411 and
then to step 406 to transfer additional data. The data transfer may resume at a
preselected checkpoint wherein the storage of data, in step 408, is periodically
committed to a nonvolatile storage medium, for example a disk storage device 220,
FIGURE 2. The transfer of data, in step 406, may then resume subsequent to the last
permanently stored data. Such a data transfer using preselected checkpoints to
resume interrupted data transfers is described in detail in the commonly owned,
co-pending U.S. Patent Application entitled, "Apparatus for Reliably Restarting
Interrupted Data Transfer and Method Therefor" (Attorney Docket No. AT9-99-276)
incorporated herein by reference.
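
The idea of resuming after the last committed data can be illustrated with a
short sketch. It is not the mechanism of the referenced co-pending application:
here a bytearray stands in for the nonvolatile storage on the receiving repeater,
its length serves as the checkpoint, and the fail_after parameter is a
hypothetical hook that simulates a connection failure.

```python
def transfer(data, stored, chunk_size=4, fail_after=None):
    """Append `data` to `stored` in chunks, starting at the last committed offset.

    Returns True once the whole distribution is stored, or False if the
    simulated connection failed first; data already committed is retained.
    """
    offset = len(stored)          # resume subsequent to the last stored data
    sent = 0
    while offset < len(data):
        if fail_after is not None and sent >= fail_after:
            return False          # connection lost; the checkpoint is kept
        chunk = data[offset:offset + chunk_size]
        stored.extend(chunk)      # commit the chunk (step 408)
        offset += len(chunk)
        sent += 1
    return True
```

Calling the function a second time with the same stored buffer sends only the
bytes that were not committed before the failure.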

Methodology 400 loops between steps 404-408, 411, 415, and 425 until the
full distribution is transferred, or the distribution aborts in step 420. When the
transfer completes as determined in step 415, methodology 400 then proceeds to
step 409 and returns status information to one or more "report-to" machines as
previously discussed in conjunction with step 420. The status may be sent to the
endpoint requesting a distribution using the DID provided as described in conjunction
with step 350, FIGURE 3, above.

In step 410 it is determined if the current repeater is a gateway. If so, in
step 412 a connection to the endpoints receiving the distribution is established. In
step 414, it is determined if the connection is available. Again, as discussed
hereinabove in conjunction with step 405, a preselected number of connections may
be available in accordance with a priority scheme. If a connection is not available,
methodology 400 proceeds through steps 414 and 418 until a connection is
established, or in step 418, the distribution lifetime expires, wherein the distribution
aborts in step 419, and the distribution status is returned in step 420, as previously
described.

If, however, in step 414, a connection at the requested priority is available,
while the connection is established, in step 417, data is transferred to the endpoint,
step 416. Transfer continues while, in step 422, the complete distribution has not
been transferred, in which case the methodology 400 loops between steps 414, 417,
418, 416 and 422. The aforementioned method and apparatus for connection
management described in the co-pending, commonly owned U.S. Patent Application
entitled "Apparatus for Connection Management and Method Therefor," incorporated
herein by reference, may be used to implement steps 412-418.

The data transfer in step 416 may occur using the checkpoint process
previously discussed in conjunction with the data transfer between repeaters,
step 406, and described in detail in the co-pending, commonly owned U.S. Patent
Application entitled "Apparatus for Data Depoting and Method Therefor," (Attorney
Docket No. AT9-99-275) incorporated herein by reference. On completion of the
transfer, step 422, status is returned in step 420 to one or more report-to systems.

Methodology 400 may also be used to return distribution results information
to the report-to systems. In using methodology 400 in this way, the report-to systems
play the role of endpoint systems with respect to data distributions, and each endpoint
receiving, or repeater relaying, the distribution data plays the role of a source system.
In this way, for example, log files generated by an installation program may be returned
to a preselected report-to system. Otherwise, as would be recognized by an artisan of
ordinary skill, FIGURE 4 is unchanged.

A methodology for opening connections in accordance with a preselected 
distribution priority level which may be used in conjunction with steps 404 and 405, 
and 412 and 414 of methodology 400 will now be described. FIGURE 5 illustrates, 
in flowchart form, a methodology 500 for performing a session availability 
determination in accordance with an embodiment of the present invention. 

In step 505, a session request is received, for example, from methodology 400, 
FIGURE 4 when opening a connection. A particular distribution in an embodiment 
of the present invention may be assigned one of three priority levels, low, medium or 
high, which determines the order in which the distribution is handled by a repeater. 
Distributions with higher priority levels are handled before those with lower priority, 
and distributions with the same priority level are handled in the order in which they 
are received by the repeater. The priority level may be set by an application 
requesting the distribution. A default priority may be set at the source repeater when 
it receives a distribution request from an application. 

In step 510, it is determined if the distribution has a high priority level. If not,
then in step 520, it is determined if the distribution has a medium priority level. If
not, then the distribution has a low priority, step 530, and, in step 535, it is determined
if a session is available in the low-priority pool. If a low priority session is available,
then in step 550, methodology 500 signals that a connection is available. In an
embodiment of the present invention in accordance with methodology 400,
FIGURES 4A and 4B, the information from step 550 may be received in steps 405
and 414 in response to the opening of connections in steps 404 and 412, respectively.
Conversely, if no low priority sessions are available in step 535, in step 540
methodology 500 signals that no session is available.

Returning to step 510, if the data transfer is determined to be a high priority
transfer, then in step 515 it is determined if a high priority session is available. If so,
then methodology 500 proceeds to step 550. Otherwise, if high priority sessions are
unavailable, that is, fully used by other distributions, then in step 525 it is determined
if a medium priority session is available. Again, if a medium priority session is
available, then step 550 is performed; otherwise, in step 535, it is determined if a low
priority level session is available. If so, step 535, then step 550 is performed;
otherwise, no sessions are available and methodology 500 proceeds to step 540.

Similarly, if in step 510, it has been determined that the distribution is not a
high priority distribution, it is determined if the distribution has a medium priority,
step 520. If not, it must again be a low priority distribution, step 530, previously
described. Otherwise, in step 520 it is a medium priority distribution, and in step 525
it is determined if a medium priority session is available. As before, if no medium
priority sessions are available, it is determined if a low priority session is available,
step 535. In this manner, a data distribution with a given priority level can use the
number of sessions reserved for its priority level plus any sessions allocated to lower
priority levels. If no sessions are available at the assigned priority or lower, the
methodology signals no available sessions, step 540, as previously described.
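
The fall-through from a distribution's own priority pool to the lower-priority pools can be illustrated with a short sketch. The class `SessionPools`, its method names, and the per-pool session counts are hypothetical and chosen only for the example; the step numbers in the comments refer to methodology 500 as described above.

```python
PRIORITY_ORDER = ["high", "medium", "low"]  # highest to lowest


class SessionPools:
    """Tracks free sessions reserved for each priority level."""

    def __init__(self, reserved):
        # reserved: mapping of priority level -> number of reserved sessions
        self._free = dict(reserved)

    def acquire(self, priority):
        """Take a session at the given priority or any lower one.

        Returns the name of the pool the session came from (step 550),
        or None when no session is available (step 540).
        """
        start = PRIORITY_ORDER.index(priority)
        # Try the distribution's own pool first, then each lower pool in turn.
        for level in PRIORITY_ORDER[start:]:
            if self._free.get(level, 0) > 0:
                self._free[level] -= 1
                return level
        return None

    def release(self, level):
        # Return a session to the pool it was taken from.
        self._free[level] += 1
```

Under these assumptions a high priority distribution can draw on all three pools, a medium priority distribution on two, and a low priority distribution only on its own.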

Methodology 500 overcomes the tasking conflicts that might arise when a 
repeater is processing a lower priority distribution and a higher priority distribution is 
received. Without the ability to specify priority sessions, the high priority distribution 
would have to wait until the lower priority distribution was complete, or alternatively,
the lower priority distribution would be interrupted, causing inefficiencies within the
distribution system because the lower priority distribution would subsequently have to
be re-distributed.

As discussed hereinabove, in conjunction with, for example, FIGURE 4, 
results reports are generated by repeaters and endpoints reporting status of the transfer 
of a data distribution. Results are sent to one or more "report-to" machines, which 
may include the distribution source system. Results are stored in the distribution 
database, such as database 120 in FIGURE 1, maintained by the distribution manager, 
and may be entered into the database in conjunction with methodology 600, 
FIGURE 6. In step 601, methodology 600 loops until results are received. Results 
received are correlated in step 610 using the DID, which may be provided in 
accordance with methodology 300 for processing distribution requests, FIGURE 3. 
Recall that distributions may be sent to a plurality of endpoints, each of which may be 
reached via a different sequence of repeaters, as discussed hereinabove in conjunction 
with FIGURE 1. Thus, results for a particular distribution may be forwarded to the
"report-to" machines from several endpoints at different times. The DID, which is
unique to the particular distribution, then allows the results associated with that 
distribution to be correlated for updating the database entry corresponding to the 
distribution, in step 630. In this way, the distribution manager maintains a current 
status for each distribution. 
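
As a minimal sketch of this correlation step, results arriving at different times from different endpoints can be keyed on the DID so that each one updates the same database entry. The in-memory `DistributionDatabase` class and the status strings below are assumptions for illustration only; the text does not describe the schema of the distribution database.

```python
from collections import defaultdict


class DistributionDatabase:
    """In-memory stand-in for the distribution database."""

    def __init__(self):
        # DID -> {endpoint name: most recent status reported for it}
        self._entries = defaultdict(dict)

    def record_result(self, did, endpoint, status):
        # Correlate the incoming result with its distribution via the DID
        # and update that distribution's entry.
        self._entries[did][endpoint] = status

    def status(self, did):
        # Return a copy of the current per-endpoint status for a distribution.
        return dict(self._entries.get(did, {}))
```

Because every result carries the DID, the order and timing of arrival do not matter; each report simply overwrites the previous status for its endpoint.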

To avoid indefinitely filling the database, distribution records can be deleted. 
Automatic deletion may be provided after the expiration of a preselected record 
lifetime. In step 635, if the entry lifetime has expired, the corresponding entry is 
deleted in step 650. Even if, however, the entry lifetime has not expired, the entry 
may be deleted manually. In step 640, it is determined if a system administrator has 
initiated a command to delete an entry. If so, the entry is, again, deleted in step 650. 
Otherwise, methodology 600 returns to step 601 to continue to receive distribution 
results. 
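
The two deletion paths (automatic expiry after a preselected lifetime, and manual deletion by an administrator) might be sketched as follows. The class `DistributionRecords`, the use of seconds as the lifetime unit, and the explicit `now` parameter are assumptions introduced for the example.

```python
import time


class DistributionRecords:
    """Tracks creation times of distribution entries for lifetime-based deletion."""

    def __init__(self, lifetime_seconds):
        self._lifetime = lifetime_seconds
        self._created = {}  # DID -> creation timestamp

    def add(self, did, now=None):
        self._created[did] = time.time() if now is None else now

    def purge_expired(self, now=None):
        """Delete entries whose lifetime has expired; return their DIDs."""
        now = time.time() if now is None else now
        expired = [did for did, created in self._created.items()
                   if now - created >= self._lifetime]
        for did in expired:
            del self._created[did]
        return expired

    def delete(self, did):
        """Manual deletion; returns True if the entry existed."""
        return self._created.pop(did, None) is not None
```

Passing `now` explicitly keeps the sketch deterministic; a deployed system would presumably rely on the system clock.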

The foregoing has outlined rather broadly the features and technical 
advantages of the present invention in order that the detailed description of the 
invention that follows may be better understood. Additional features and advantages 
of the invention will be described hereinafter which form the subject of the claims of 
the invention.

WHAT IS CLAIMED IS:

1. A data processing system for bulk data transfer comprising:
    a source data processing system for distributing data to one or more target data
    processing systems;
    one or more fan-out nodes for transferring said data between said source system
    and each of said one or more target data processing systems and transferring
    result information between said one or more target data processing systems and a
    preselected set of one or more data processing systems for managing data
    distributions.

2. The system of Claim 1 wherein each of said one or more fan-out nodes is operable
for caching at least a portion of a data distribution and at least a portion of said
result information.

3. The system of Claim 1 wherein a data distribution has a preselected priority, said
preselected priority operable for determining an availability of resources for said
transferring of said data and said transferring of said result information.

4. The system of Claim 1 wherein said one or more fan-out nodes comprises a plurality
of fan-out nodes, and wherein said transferring of said data comprises:
    receiving said data from said source data processing system by a first fan-out
    node;
    sending said data to a second fan-out node; and
    sending said data from said second fan-out node to one or more of said target
    data processing systems.

5. The system of Claim 1 wherein said source data processing system distributes said
data in response to a request from at least one of said target data processing systems.

6. The system of Claim 5 wherein a preselected one of said one or more data processing
systems for managing data distributions enqueues said request in a database.

7. The system of Claim 6 wherein said request comprises:
    a list of target data processing systems to receive the data;
    an identifier of a method by which the target machines will receive and process
    the data; and
    an identifier of a notification method by which said result information from each
    endpoint system will be received by said preselected set of one or more data
    processing systems for managing data distributions.

8. The system of Claim 6 wherein said request is assigned a preselected distribution
priority and said request is enqueued in accordance with said preselected distribution
priority.

9. A method for distributing data comprising the steps of:
    transferring said data via a first set of one or more fan-out nodes to one or more
    endpoint systems; and
    transferring results information via a second set of said one or more fan-out nodes
    from said one or more endpoint systems to a preselected set of one or more data
    processing systems for managing data distributions, said results information
    generated in response to said step of transferring said data.

10. The method of Claim 9 wherein each of said one or more fan-out nodes is operable
for caching at least a portion of a data distribution and at least a portion of said
result information.

11. The method of Claim 9 wherein said step of transferring said data is performed in
response to a request received from an application on at least one of said plurality of
endpoints.

12. The method of Claim 11 wherein said request includes:
    a list of target data processing systems to receive the data;
    an identifier of a method by which the target machines will receive and process
    the data; and
    an identifier of a notification method by which said result information from each
    endpoint system will be received by said preselected set of one or more data
    processing systems for managing data distributions.

13. The method of Claim 10 further comprising the steps of:
    assigning one of a preselected set of priority values to each data distribution;
    and
    determining an availability of a network connection for said step of transferring
    said data in response to said one of said preselected set of priority values.

14. The method of Claim 11 further comprising the step of determining an availability
of a network connection for said transferring of results information in response to said
one of said preselected set of priority values.

15. The method of Claim 13 further comprising the steps of:
    assigning a distribution lifetime value to each data distribution; and
    aborting said step of transferring said data in response to an unavailability of
    said connection for a time interval corresponding to said distribution lifetime.

16. A computer program product embodied in a machine readable storage medium, the
program product including programming for distributing data comprising instructions
for:
    transferring said data via a first set of one or more fan-out nodes to one or more
    endpoint systems; and
    transferring results information via a second set of said one or more fan-out nodes
    from said one or more endpoint systems to a preselected set of one or more data
    processing systems for managing data distributions, said results information
    generated in response to said step of transferring said data.

17. The program product of Claim 16 wherein each of said one or more fan-out nodes is
operable for caching at least a portion of a data distribution and at least a portion of
said result information.

18. The program product of Claim 16 wherein said instructions for transferring said
data are performed in response to a request received from an application on at least
one of said plurality of endpoints.

19. The program product of Claim 18 wherein said request includes:
    a list of target data processing systems to receive the data;
    an identifier of a method by which the target machines will receive and process
    the data; and
    an identifier of a notification method by which said result information from each
    endpoint system will be received by said preselected set of one or more data
    processing systems for managing data distributions.

20. The program product of Claim 17 further comprising instructions for:
    assigning one of a preselected set of priority values to each data distribution;
    and
    determining an availability of a network connection for said step of transferring
    said data in response to said one of said preselected set of priority values.

21. The program product of Claim 18 further comprising instructions for determining an
availability of a network connection for said transferring of results information in
response to said one of said preselected set of priority values.

22. The program product of Claim 20 further comprising instructions for:
    assigning a distribution lifetime value to each data distribution; and
    aborting said step of transferring said data in response to an unavailability of
    said connection for a time interval corresponding to said distribution lifetime.

AN APPARATUS AND SERVICE FOR DISTRIBUTING AND
COLLECTING BULK DATA BETWEEN A LARGE NUMBER OF MACHINES

ABSTRACT OF THE DISCLOSURE 

A method that allows applications, through a single mechanism, to asynchronously
distribute large amounts of data from a source node to multiple destination nodes, to
process that data on each individual node, and to collect the results of that processing
on one or more report-to nodes. Distributions are given levels of priority that determine
the order in which they are handled by repeaters. A distribution with a given priority can
use the number of sessions reserved for its priority level plus any sessions allocated
for lower priority levels. Each distribution is enqueued in a persistent queue, according
to its priority, for subsequent distribution, and an ID that can be used as a correlator
for the results is immediately returned to the caller.

PATENT APPLICATION

As a below named inventor, I hereby declare that: 

My residence, post office address and citizenship are as stated below next to my 

name; 

I believe I am the original, first and sole inventor (if only one name is listed 
below) or an original, first and joint inventor (if plural names are listed below) of the 
subject matter which is claimed and for which a patent is sought on the invention entitled 

AN APPARATUS AND METHOD FOR DISTRIBUTING AND 
COLLECTING BULK DATA BETWEEN A LARGE NUMBER OF MACHINES 

the specification of which (check one) 

☒ is attached hereto.

□ was filed on 

as Application Serial No. 

and was amended on 



I hereby state that I have reviewed and understand the contents of the above identified 
specification, including the claims, as amended by any amendment referred to above. 

I acknowledge the duty to disclose information which is material to the patentability of 
this application in accordance with Title 37, Code of Federal Regulations, §1.56. 

I hereby claim foreign priority benefits under Title 35, United States Code, §119 of any 
foreign application(s) for patent or inventor's certificate listed below and have also 
identified below any foreign application for patent or inventor's certificate having a filing 
date before that of the application on which priority is claimed: 

Prior Foreign Application(s): Priority Claimed 

□ Yes □ No 

(Number) (Country) (Day/Month/Year)

I hereby claim the benefit under Title 35, United States Code, §120 of any United States
application(s) listed below and, insofar as the subject matter of each of the claims of this
application is not disclosed in the prior United States application in the manner provided
by the first paragraph of Title 35, United States Code, §112, I acknowledge the duty to
disclose information material to the patentability of this application as defined in Title 
37, Code of Federal Regulations, §1.56 which occurred between the filing date of the 
prior application and the national or PCT international filing date of this application: 



(Application Serial #) (Filing Date) (Status) 

I hereby declare that all statements made herein of my own knowledge are true and that 
all statements made on information and belief are believed to be true; and further that 
these statements were made with the knowledge that willful false statements and the like 
so made are punishable by fine or imprisonment, or both, under Section 1001 of Title 18 
of the United States Code and that such willful false statements may jeopardize the 
validity of the application or any patent issued thereon. 

POWER OF ATTORNEY: As a named inventor, I hereby appoint the following 
attorneys and/or agents to prosecute this application and transact all business in the Patent 
and Trademark Office connected therewith. 

John W. Henderson, Jr., Reg. No. 26,907; James H. Barksdale, Jr., Reg. No. 24,091; 
Thomas E. Tyson, Reg. No. 28,543; Robert M. Carwell, Reg. No. 28,499; Jeffrey S. 
LaBaw, Reg. No. 31,633; Douglas H. Lefeve, Reg. No. 26,193; Casimer K. Salys, Reg. 
No. 28,900; David A. Mims, Jr., Reg. No. 32,708; Mark E. McBurney, Reg. No. 33,114; 
Anthony V. S. England, Reg. No. 35,129; Volel Emile, Reg. No. 39,969; Christopher A. 
Hughes, Reg. No. 26,914; Edward A. Pennington, Reg. No. 32,588; John E. Hoel, Reg.
No. 26,279; Joseph C. Redmond, Jr., Reg. No. 18,753; Leslie A. Van Leeuwen, Reg. No.
42,196; Marilyn S. Dawkins, Reg. No. 31,140; Kelly K. Kordzik, Reg. No. 36,571; Barry
S. Newberger, Reg. No. 41,527; Ross S. Garsson, Reg. No. 38,150; and Bill R. Naifeh,
Reg. No. P 44,962. 

Send correspondence to: James J. Murphy, 5400 Renaissance Tower, 1201 Elm Street, 
Dallas, Texas 75270-2199, and direct all telephone calls to Kelly K. Kordzik at
(512) 370-2851.